Adversarial training has been empirically shown to be more prone to overfitting than standard training, but the underlying reasons are not yet fully understood. In this paper, we identify one cause of overfitting related to current practices of generating adversarial samples from misclassified samples. To address this, we propose an alternative approach that leverages the misclassified samples to mitigate the overfitting problem. We show that our approach achieves better generalization while maintaining robustness comparable to state-of-the-art adversarial training methods on a wide range of computer vision, natural language processing, and tabular tasks.
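The abstract does not specify how adversarial samples are generated; as background, here is a minimal numpy sketch of the standard FGSM-style perturbation commonly used to craft adversarial samples during training, shown for a binary logistic model. The function name and setup are illustrative, not the paper's method.

```python
import numpy as np

def fgsm_logistic(w, b, x, y, eps):
    """FGSM-style perturbation for binary logistic regression.
    Loss is log(1 + exp(-y * (w.x + b))) with labels y in {-1, +1};
    the input is pushed one signed-gradient step up the loss surface."""
    margin = y * (x @ w + b)
    # d loss / d x = -y * sigmoid(-margin) * w
    grad = -y * (1.0 / (1.0 + np.exp(margin))) * w
    return x + eps * np.sign(grad)
```

A perturbed sample always has a margin no larger than the clean one, so training on such samples exposes the model to its own worst-case inputs.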
Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches to adversarial training in language models. Specifically, we study the effects of pre-training data augmentation, as well as training-time input perturbations vs. embedding-space perturbations, on the robustness and generalization of BERT-like language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input-space perturbations, whereas training with embedding-space perturbations significantly improves generalization. A linguistic correlation analysis of the neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons. To the best of our knowledge, this is the first work to carry out a deep qualitative analysis of different methods of generating adversarial examples for the adversarial training of language models.
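An embedding-space perturbation, in contrast to an input-space one, modifies the continuous token embeddings directly. A minimal sketch in the spirit of PGD/FreeLB-type methods for BERT-like models; the function name and step size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def embedding_perturbation(emb, grad, eps=1.0):
    """One L2-normalized gradient-ascent step in embedding space:
    move the token embeddings a fixed distance eps in the direction
    that most increases the loss. A zero gradient leaves them unchanged."""
    norm = np.linalg.norm(grad)
    if norm == 0:
        return emb
    return emb + eps * grad / norm
```

Because the perturbation lives in the continuous embedding space, it never has to correspond to a real token, which is one reason its effect can differ from discrete input-space perturbations.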
We investigate ensemble methods for prediction in an online setting. Unlike the existing literature on ensembling, we introduce, for the first time, a new approach in which a meta learner effectively combines the base model predictions using a superset of the features (the union of the base models' feature vectors) instead of the predictions themselves. Our model does not feed the base models' predictions into a machine learning algorithm as inputs, but instead chooses the best possible combination at each time step based on the state of the problem. We explore three constraint spaces for linearly combining the base predictions: convex combinations, where the components of the ensembling vector are nonnegative and sum to 1; affine combinations, where the weight vector components are required to sum to 1; and unconstrained combinations, where the components are free to take any real value. The constraints are theoretically analyzed under known statistics and integrated into the learning procedure of the meta learner as part of the optimization in an automated manner. To demonstrate the practical efficiency of the proposed method, we employ a gradient-boosted decision tree and a multi-layer perceptron separately as the meta learners. Our framework is generic, so any machine learning architecture can serve as the ensembler as long as it allows a custom differentiable loss for minimization. We demonstrate the learning behavior of our algorithm on synthetic data and show significant performance improvements over conventional methods on various real-life datasets widely used in well-known data competitions. Furthermore, we openly share the source code of the proposed method to facilitate further research and comparison.
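The three constraint spaces can be made concrete via Euclidean projections of a weight vector onto each set. A sketch (the sorting-based simplex projection is a standard algorithm; the function names are ours, not from the paper):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex
    {w : w_i >= 0, sum(w) = 1}, i.e. the convex-combination constraint,
    via the standard sort-and-threshold algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_affine(v):
    """Projection onto {w : sum(w) = 1}, i.e. the affine-combination
    constraint: shift every component by the mean residual."""
    return v + (1.0 - v.sum()) / len(v)
```

The unconstrained case needs no projection; during learning, applying the appropriate projection after each gradient step keeps the ensembling vector inside its constraint set.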
Automatic keyword extraction (AKE) has gained importance with the increasing amount of digital textual data that modern computing systems process. It has various applications in information retrieval (IR) and natural language processing (NLP), including text summarisation, topic analysis and document indexing. This paper proposes a simple but effective post-processing-based universal approach to improve the performance of any AKE method, via an enhanced level of semantic-awareness supported by PoS-tagging. To demonstrate the performance of the proposed approach, we considered word types retrieved from a PoS-tagging step and two representative sources of semantic information: specialised terms defined in one or more context-dependent thesauri, and named entities in Wikipedia. The above three steps can simply be appended to any AKE method as a post-processor, which re-evaluates all candidate keywords according to context-specific and semantic-aware criteria. For five state-of-the-art (SOTA) AKE methods, our experimental results on 17 selected datasets show that the proposed approach improves their performance both consistently (up to 100\% in terms of improved cases) and significantly (between 10.2\% and 53.8\%, with an average of 25.8\%, in terms of F1-score across all five methods), especially when all three enhancement steps are used. Our results have profound implications given how easily the proposed approach can be applied to any AKE method and further extended.
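A post-processor of this kind could look as follows; this is a hypothetical sketch in the spirit of the described pipeline, with all names, tags, and the boost factor being our assumptions rather than the paper's actual criteria.

```python
def rescore_keywords(candidates, pos_tags, thesaurus, entities,
                     boost=1.5, noun_tags=("NN", "NNS", "NNP", "NNPS")):
    """Hypothetical semantic-aware post-processor for AKE output.
    `candidates` is a list of (phrase, score) pairs from any AKE method;
    `pos_tags` maps each phrase to its head-word PoS tag; `thesaurus`
    and `entities` are sets of domain terms and named entities."""
    rescored = []
    for phrase, score in candidates:
        if pos_tags.get(phrase) not in noun_tags:
            continue  # drop candidates whose head word is not a noun
        if phrase in thesaurus or phrase in entities:
            score *= boost  # reward domain terms and named entities
        rescored.append((phrase, score))
    # return the surviving candidates, best first
    return sorted(rescored, key=lambda t: t[1], reverse=True)
```

Because the post-processor only consumes (phrase, score) pairs, it is agnostic to which AKE method produced them, which matches the claimed universality.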
The horseshoe prior is known to possess many desirable properties for Bayesian estimation of sparse parameter vectors, yet its density function lacks an analytic form. As such, it is challenging to find a closed-form solution for the posterior mode. Conventional horseshoe estimators use the posterior mean to estimate the parameters, but these estimates are not sparse. We propose a novel expectation-maximisation (EM) procedure for computing the MAP estimates of the parameters in the case of the standard linear model. A particular strength of our approach is that the M-step depends only on the form of the prior and is independent of the form of the likelihood. We introduce several simple modifications of this EM procedure that allow for straightforward extension to generalised linear models. In experiments on simulated and real data, our approach performs comparably to, or better than, state-of-the-art sparse estimation methods in terms of statistical performance and computational cost.
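For reference, the standard horseshoe hierarchy for the linear model (the textbook formulation, not the paper's specific EM construction) can be written as:

```latex
y \mid \beta, \sigma^2 \sim \mathcal{N}(X\beta, \sigma^2 I), \qquad
\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \qquad
\lambda_j \sim C^{+}(0, 1),
```

where $\tau$ is the global and $\lambda_j$ the local shrinkage parameter. Marginalising the half-Cauchy local scales leaves $p(\beta_j \mid \tau)$ without a closed form, which is precisely why the posterior mode has no analytic solution and motivates an EM treatment with the $\lambda_j$ as latent variables.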
We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmentation strategy to increase code-switching (CS) performance. With increasing globalization, multiple languages are increasingly used interchangeably during fluent speech. Such CS complicates traditional speech recognition and translation, as we must recognize which language was spoken first and then apply a language-dependent recognizer and a subsequent translation component to generate the desired target-language output. Such a pipeline introduces latency and errors. In this paper, we eliminate this need by treating speech recognition and translation as one unified end-to-end speech translation problem. By training LAST with both input languages, we decode speech into one target language regardless of the input language. LAST delivers comparable recognition and speech translation accuracy in monolingual usage, while reducing latency and error rate considerably when CS is observed.
The widespread dissemination of false information online, including misinformation and disinformation, has become a major problem for our highly digitalized and globalized society. A large body of research has been conducted to better understand different aspects of online false information, such as the behaviors of different actors and its propagation patterns, and to better detect and prevent such information using technical and socio-technical means. One major approach to detecting and debunking false information online is the use of human fact-checkers, who can be assisted by automated tools. Despite this large body of research, we noticed a significant gap: the lack of conceptual models describing the complex ecosystem of false information and fact-checking. In this paper, we report the first such graphical models of this ecosystem, covering false information online in multiple contexts, including traditional media and user-generated content. The proposed models cover a wide range of entity types and relationships, and can serve as a new tool for researchers and practitioners to study online false information and the effects of fact-checking.
Recent advances in AI, especially deep learning, have led to a significant increase in the creation of new realistic synthetic media (video, images and audio) and the manipulation of existing media, which has given rise to the new term 'deepfake'. Based on research literature and resources in English and Chinese, this paper provides a comprehensive overview of deepfakes, covering multiple important aspects of this emerging concept, including 1) different definitions, 2) commonly used performance metrics and standards, and 3) deepfake-related datasets, challenges, competitions and benchmarks. In addition, the paper reports a meta-review of 12 deepfake-related survey papers published in 2020 and 2021, focusing not only on the above aspects but also on an analysis of key challenges and recommendations. We believe this paper is the most comprehensive review of deepfakes in terms of the aspects covered, and the first to cover both English and Chinese literature and resources.
Recent years have seen substantial growth in the capability of systems designed to generate text that mimics the fluency and coherence of human language. From this, considerable research has examined the potential uses of these natural language generators (NLG) across a wide variety of tasks. The increasing capacity of powerful text generators to convincingly mimic human writing raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish human-written from machine-generated text, malicious actors could leverage these powerful NLG systems for a wide variety of purposes, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots as a means of convincing users to divulge private information. In this paper, we provide an overview of the NLG field via the identification and examination of 119 survey-like papers within NLG research. From these identified papers, we outline a proposed high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalized NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research, and examine the potential roles of NLG in deception and in detection systems to counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that existing text generation systems often exhibit. This work provides a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.
Serially connected robots are promising candidates for performing tasks in confined spaces, such as search and rescue in large-scale disasters. Such robots are typically limbless, and we hypothesize that the addition of limbs can improve mobility. However, a challenge in designing and controlling such devices lies in coordinating high-dimensional redundant modules in a way that improves mobility. Here, we develop a general framework for controlling serially connected multi-legged robots. Specifically, we combine two approaches to construct a general shape-control scheme that can provide baseline patterns of self-deformation ('gaits') for effective locomotion across a variety of robot morphologies. First, we take inspiration from dimensionality reduction and biological gait classification schemes to generate cyclic patterns of body deformation and leg lifting/lowering that facilitate the generation of arbitrary substrate contact patterns. Second, we use geometric mechanics methods to identify the optimal phasing of these undulations to maximize speed and/or stability. Our scheme allows the development of effective gaits for multi-legged robot locomotion on flat frictional terrain across diverse numbers of limbs (4, 6, 16, even 0 limbs) and body actuation capabilities (including sidewinding gaits on limbless devices). By properly coordinating body undulation and leg placement, our framework combines the advantages of limbless robots (modularity) and legged robots (mobility). We anticipate that our framework can provide general control schemes for the rapid deployment of general multi-legged robots, paving the way toward machines that traverse complex environments under real-world conditions.
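A cyclic pattern of body deformation and leg lifting/lowering of this kind can be sketched as a traveling body wave plus a phased stance/swing schedule. This is a hypothetical serpenoid-style illustration of the general idea; the function names, parameters, and waveforms are our assumptions, not the paper's controller.

```python
import numpy as np

def serpenoid_gait(n_joints, t, amp=0.5, spatial_freq=1.0, temporal_freq=1.0):
    """Body-deformation wave: joint i at time t follows a traveling
    sinusoid, a common baseline for undulatory robot locomotion."""
    i = np.arange(n_joints)
    phase = 2 * np.pi * (temporal_freq * t - spatial_freq * i / n_joints)
    return amp * np.sin(phase)

def leg_contact(n_legs, t, duty=0.5, temporal_freq=1.0):
    """Leg lifting/lowering schedule: leg i (phase-offset evenly along
    the body) is in stance while its cyclic phase is within the duty
    fraction, producing a periodic substrate contact pattern."""
    i = np.arange(n_legs)
    phase = (temporal_freq * t - i / n_legs) % 1.0
    return phase < duty
```

Shifting the relative phase between the body wave and the contact schedule is exactly the degree of freedom that a geometric-mechanics analysis would tune for speed or stability.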